Received: by cheltenham.cs.arizona.edu; Wed, 21 Jul 1993 08:44:20 MST
Date: Wed, 21 Jul 93 09:13:07 CDT
From: "Richard L. Goerwitz" <goer@midway.uchicago.edu>
Message-Id: <9307211413.AA01333@midway.uchicago.edu>
To: icon-group@cs.arizona.edu
Subject: Icon for large-scale stuff
Status: R
Errors-To: icon-group-errors@cs.arizona.edu
>Some time ago I re-posted a message from the Linguist List having to do
>with a linguistic software initiative. I had wondered if (and suggested
>that) Icon might become more 'popular' by showing utility in such an
>initiative.
>One reply claimed that Icon would not have such utility since it could not
>manage the massive amounts of data/calculations that are often required in
>linguistic work.
Sounds pretty pessimistic. In fact, I've been using Icon successfully
on large corpora for several years now. Naturally there are some things
Icon does not do well. Despite the natural tendency to be lazy, one
really does have to maintain facility with several languages, both
high-level and low-level, in order to quickly construct NLP tools that
can do the job.
One thing to remember is that NL stuff often involves constructing
human-machine interfaces. The software only has to be fast enough to map
NL queries to some primitive set of instructions. This can be done in
real time using Icon-based tools. NLP also may involve processing large
corpora in batch mode. Again, although Icon will not do this sort of
thing as quickly as C, it's certainly no worse than LISP or Prolog, and
these are two of the main languages used for such batch processing. The
idea is that it doesn't matter whether a batch job finishes in one minute
or ten, as long as it really is run in batch mode. Response time is
mainly a problem for interactive systems.
One final note I might inject here is that Icon performs perfectly well
for multi-megabyte databases, especially under "real" operating systems
with sensible file systems. If you want evidence of this, ftp my silly
"Bibleref" program from cs.arizona.edu. This program can find a passage
requested by the user, decompress it, and display it in just a few
seconds. It can perform word searches almost as quickly (unless you are
looking for something like "the"). If I'd skipped the compression and
decompression phases by using a straight, human-readable database, the
delays would have been even shorter.
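The speed described above usually comes from never scanning the whole file: an index maps each passage reference to a byte offset so the program can seek straight to it, and an inverted index maps each word to the passages containing it. The sketch below, in Python rather than Icon, shows that general technique under stated assumptions. It is not Bibleref's actual design (which, per the message, also compresses the text); the function names `build_database`, `fetch`, and `search` and the sample references are hypothetical, and an in-memory `io.BytesIO` stands in for a real file on disk.

```python
import io
import re

def build_database(passages):
    """Write (reference, text) passages into one byte stream and build two
    in-memory indexes (hypothetical layout, compression omitted):
      ref_index:  reference -> (offset, length), for direct seeks
      word_index: word -> list of references containing it (inverted index)"""
    data = io.BytesIO()          # stand-in for a file on disk
    ref_index = {}
    word_index = {}
    for ref, text in passages:
        raw = text.encode("utf-8")
        ref_index[ref] = (data.tell(), len(raw))
        data.write(raw)
        # Record each distinct word once per passage, in passage order.
        for word in sorted(set(re.findall(r"[a-z]+", text.lower()))):
            word_index.setdefault(word, []).append(ref)
    return data, ref_index, word_index

def fetch(data, ref_index, ref):
    """Seek directly to one passage instead of reading the whole file."""
    offset, length = ref_index[ref]
    data.seek(offset)
    return data.read(length).decode("utf-8")

def search(word_index, word):
    """Look up the passages containing a word. A very common word such as
    'the' has an enormous posting list, so listing its hits is slow."""
    return word_index.get(word.lower(), [])

passages = [
    ("Gen 1:1", "In the beginning God created the heaven and the earth."),
    ("Gen 1:2", "And the earth was without form, and void."),
    ("Gen 1:3", "And God said, Let there be light: and there was light."),
]
data, refs, words = build_database(passages)
print(fetch(data, refs, "Gen 1:2"))  # And the earth was without form, and void.
print(search(words, "earth"))        # ['Gen 1:1', 'Gen 1:2']
```

Lookup cost here depends on the size of one passage or one posting list, not on the size of the whole database, which is why a multi-megabyte text can still answer in seconds.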
I confess that there are certain projects I really wouldn't do in Icon.
In one text I found that a seemingly complex phenomenon was, in fact,
*almost* right-linear, and decided that the natural parsing tool to use
would be YACC + Lex. Still, it is flatly wrong to claim that Icon would
not be useful as a linguistic research tool. The proof is in the pudding,
and I have a lot of pudding on hand.
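A right-linear grammar is one whose rules all have the form A -> a B or A -> a, so any nonterminal on the right-hand side comes last. Such grammars generate only regular languages, which is why finite-state tools like Lex can handle them, while the "almost" part is where a parser generator like YACC earns its keep. The Python sketch below illustrates just the right-linear half: it runs such a grammar directly as a nondeterministic finite automaton whose states are the nonterminals. The grammar (a toy noun-phrase pattern over part-of-speech tags) and the names `NP_RULES` and `accepts` are hypothetical; the message does not say what the actual phenomenon was.

```python
# Hypothetical right-linear grammar, written as (lhs, terminal, next) triples:
#   ("A", "a", "B")    means  A -> a B
#   ("A", "a", FINAL)  means  A -> a   (derivation ends here)
FINAL = None

NP_RULES = [
    ("NP", "det", "ADJ"),     # NP  -> det ADJ
    ("ADJ", "adj", "ADJ"),    # ADJ -> adj ADJ
    ("ADJ", "noun", FINAL),   # ADJ -> noun
]

def accepts(rules, start, tokens):
    """Run a right-linear grammar as a nondeterministic finite automaton.

    The set of nonterminals we could be 'in' after reading a prefix is the
    automaton's current state set, so no parse stack is ever needed."""
    states = {start}
    for tok in tokens:
        states = {nxt for (lhs, term, nxt) in rules
                  if lhs in states and term == tok}
        if not states:
            return False          # no rule can consume this token
    return FINAL in states        # accept only if the last token ended a derivation

print(accepts(NP_RULES, "NP", ["det", "adj", "adj", "noun"]))  # True
print(accepts(NP_RULES, "NP", ["det", "adj"]))                 # False
```

Because only a set of states is carried forward, recognition takes one pass over the input; a construction that breaks right-linearity (for example, nested embedding) would need the stack a YACC-generated parser provides.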
-Richard Goerwitz
goer@midway.uchicago.edu